[57] ABSTRACT

A system and method provides universal access to voice- 
based documents containing information formatted using 
MIME and HTML standards using customized extensions 
for voice information access and navigation. These voice 
documents are linked using HTML hyper-links that are 
accessible to subscribers using voice commands, touch-tone 
inputs and other selection means. These voice documents 
and components in them are addressable using HTML 
anchors embedding HTML universal resource locators 
(URLs) rendering them universally accessible over the Inter-
net. This collection of connected documents forms a voice
web. The voice web includes subscriber-specific documents 
including speech training files for speaker dependent speech 
recognition, voice print files for authenticating the identity 
of a user and personal preference and attribute files for 
customizing other aspects of the system in accordance with 
a specific subscriber. 

SYSTEM AND METHOD FOR PROVIDING
AND USING UNIVERSALLY ACCESSIBLE 
VOICE AND SPEECH DATA FILES 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates generally to the construction and 
use of distributed interactive voice and speech processing 
systems, including interactive voice response (IVR) systems
and voice messaging (VM) systems. More particularly, the 
invention relates to form based publishing of voice infor- 
mation and the use of universally accessible personal pro- 
files for authentication of the user by voice signatures and 
generating context sensitive active vocabularies to improve 
speaker dependent speech recognition. The invention also 
relates to the use of the user attributes and preferences stored 
in universally accessible personal profiles to improve the 
efficiency of navigation and search as well as efficacy of
search results pertaining to user queries. 

2. Description of the Related Art 

Conventional interactive voice response (IVR) systems 
allow a user to place a telephone call into a system, navigate 
(generally using touch tone input) through a hierarchy of 
options in response to voice prompts and retrieve informa- 
tion stored in a computer database. Airlines, banks, credit 
companies and many other service organizations are just a 
few examples of the types of businesses using IVR systems 
to allow a customer (or prospective customer) to retrieve 
desired information. These conventional systems are gener-
ally organization-specific in that they offer access to a single
database or set of databases related to the goods, services or 
other aspects of the organization maintaining the IVR sys- 
tem. Thus, conventional IVR technology is used to offer 
access to information specific to a single organization (i.e. a 
specific airline, bank or credit company). For example,
airlines typically use IVR to allow callers to access flight 
arrival and departure information or to select reservation 
options, for the particular airline only. 

It is desirable to provide an IVR system that enables 
access to an aggregation of databases and services rather 
than a single database and service. One barrier to the 
provision of aggregated services in an IVR system is that 
conventional IVR systems do not have a distributed infor- 
mation publishing means. Conventional IVR systems do not 
have a mechanism for service/information providers to 
readily access the IVR system and add updated or entirely 
new information for publication on the IVR system. 

Further, conventional IVR systems are generally config- 
ured for uniform access by any caller admitted to the IVR 
system. Each caller is handled by the system in the same 
manner and offered an identical set of options. One reason 
that IVR systems use uniform user interfaces for each caller 
rather than caller-specific configurations is that conventional 
IVR systems operate in "closed" computer environments 
hosting the particular IVR system. Thus, when a caller 
accesses a conventional IVR system, the only caller-specific 
information which the system has at its disposal, is any 
information previously provided by the caller which the 
system has maintained or any information that is provided 
by the caller during the IVR session (i.e. when a user enters 
an account number using touch tone telephone input). 
Because, however, collecting and storing caller-specific 
information with conventional technology is cumbersome 
and time consuming, most IVR systems do not offer caller- 
specific (caller customized) features. 

There are numerous applications in which it is desirable
for an IVR system to use caller-specific information in
handling a call. Caller-specific information in the form of
user preferences can aid in minimizing the size of a com- 
mand tree which the user must navigate to access desired 
information. Additionally, caller specific information could 
also be used to authenticate the identity of a user in cases 
where security is an issue (i.e. in bank and credit contexts). 
Further, caller-specific speech training profiles could be used 
to implement speaker dependent speech recognition to allow 
for a caller to use voice commands in place of touch-tone
commands. Still further, an IVR system having access to
caller-specific data could be used to apply IVR technology 
in new application areas such as personal productivity. 

Thus, there is a need for an improved voice and speech 
processing system that provides universal access to caller-
specific information to provide user-customized IVR sys-
tems. Further, there is a need to provide universal access to 
voice and speech files in order to allow widespread use of 
such files for caller authentication and for performing 
speaker dependent speech recognition in IVR systems. 

SUMMARY OF THE INVENTION

The system and method of the present invention extends 
World Wide Web (referred to herein as "www" or the "web") 
and Internet technology to provide universally accessible
caller-specific profiles that are accessed by one or more IVR
systems. The invention features a set of web pages contain- 
ing information (components) formatted using MIME and 
hypertext markup language (HTML) standards with exten-
sions for voice information access and navigation. These
web pages are linked using HTML hyper-links that are
accessible to users via voice commands and touch-tone 
inputs. These web pages and components in them are 
addressable using HTML anchors and links embedding 
HTML universal (uniform) resource locators (URLs) ren-
dering them universally accessible over the Internet. This
collection of connected web pages is referred to herein as
the "voice web" and the individual pages are referred to
herein as "voice web pages". Each web page in the voice
web contains a specially tagged set of key words and touch
tone sequences that are associated with embedded anchors
and links used for navigation within the web. 

In addition, the invention features a set of linked HTML 
pages representing the user's "personal profile". The per-
sonal profile contains the user's attributes and preferences.
Attributes include the user's name, address, phone number,
personal identification code, voice imprints for 
authentication, speech training profile and other informa-
tion. Preferences include configuration preferences such as
personal greetings and gender and language selection, selec-
tion preferences such as bookmarks and favorite places and
presentation preferences such as priority ordering, default 
overrides and preferred vocabulary. 

The personal profile is designed for component access 
within web pages allowing easy extraction of context sen-
sitive profile information. In particular, speech training
profiles (included as a user attribute and which contain word
patterns representing speaker dependent training
information) are partitioned into sets of related words likely
to occur in combination within corresponding voice web
pages. A set of command and control words such as "play,
pause, continue, previous, next, home, reload, help, etc." are 
stored in a top level component set enabling user dependent 
but context independent navigation and control. Other com-
ponent sets are designed to match the key word sets in
corresponding voice web pages such as a calendar page or
an address book page enabling user and context dependent 
navigation and control. 

When a user calls into the distributed voice and speech
processing system associated with the voice web, the system 
first identifies the user utilizing a unique account number 
(such as phone number or social security number). Next, it 
accesses the user's personal profile using the corresponding 
URL and retrieves the user attributes and preferences related 
to authentication and security. Using this personal profile 
information, the voice web system authenticates the identity 
of the user using a combination of personal identification 
code based password checking and voice imprint matching. 
The voice imprint is any sufficiently long utterance or phrase
that the user has previously entered into his/her profile. Each 
user's voice imprint is analyzed and stored in the profile for 
quick matching on demand with a real-time provided user 
sample. The combination of every individual's unique vocal 
characteristics stored in the voice imprint coupled with the 
random choice of the password phrase ensures a high degree 
of security and authentication. 
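
The two-factor check described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the personal profile is a dictionary holding a salted hash of the personal identification code and a numeric feature vector for the voice imprint, and that a sample matches when its cosine similarity to the stored imprint reaches a threshold. The field names, the feature representation and the threshold are hypothetical; the source does not specify how imprints are analyzed or compared.

```python
import hashlib
import hmac
import math


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash of the personal identification code."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 100_000)


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate(profile, entered_pin, sample_features, threshold=0.9) -> bool:
    """Accept the caller only if BOTH the code and the voice sample match.

    `profile` is a hypothetical dict with keys "salt", "pin_hash" and
    "voice_imprint"; the similarity threshold is an assumed value.
    """
    pin_ok = hmac.compare_digest(
        hash_pin(entered_pin, profile["salt"]), profile["pin_hash"]
    )
    voice_ok = (
        cosine_similarity(profile["voice_imprint"], sample_features) >= threshold
    )
    return pin_ok and voice_ok
```

Requiring both factors means that a stolen code alone, or a recording of an unrelated utterance alone, is rejected in this sketch.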

Once authenticated, the user is allowed to navigate and 
access more information from the voice web using voice 
commands. In order to effectively accomplish this task, the 
voice web system retrieves the context independent com- 
mand and control key word set from the user's speech 
profile. 

The voice web system then presents a top level voice web 
personal home page for user's perusal. At the same time, it 
retrieves the set of word recognition patterns associated with 
the key words in the presented page from the user's speech 
profile. Thus, the system is able to match the active vocabu- 
lary and associated speaker dependent word patterns 
dynamically in a context sensitive manner. The process 
continues as the user navigates from page to page. The voice 
web system dynamically retrieves the suitable subset of 
training word patterns from the user's speech profile match- 
ing the voice navigation key words in the page being 
presented to the user. 
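
A minimal sketch of this context sensitive selection follows. It assumes a speech profile stored as a dictionary with a top level "commands" set and per-page "components" sets, each mapping a word to its speaker dependent pattern. The structure and names are illustrative assumptions, not the format defined by the invention.

```python
def active_word_patterns(speech_profile, component_name, page_keywords):
    """Build the active vocabulary for the page currently being played.

    The result always contains the context independent command set and
    adds only those trained words that appear as key words on the page.
    """
    # Start from the top level command and control set (play, pause, ...).
    active = dict(speech_profile["commands"])
    # Look up the component set that corresponds to the current page.
    component = speech_profile["components"].get(component_name, {})
    for word in page_keywords:
        if word in component:
            active[word] = component[word]
    return active


# Hypothetical profile: pattern values stand in for trained word patterns.
EXAMPLE_PROFILE = {
    "commands": {"play": "cmd-pattern-1", "pause": "cmd-pattern-2"},
    "components": {
        "calendar": {"today": "cal-pattern-1", "tomorrow": "cal-pattern-2"},
        "address_book": {"smith": "addr-pattern-1"},
    },
}
```

Calling `active_word_patterns(EXAMPLE_PROFILE, "calendar", ["today", "tomorrow"])` retrieves four patterns rather than the whole profile, which is the size reduction the passage describes.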

The process described above greatly reduces the size of 
the training information that needs to be retrieved at any 
time while significantly enhancing accuracy of speech rec- 
ognition using speaker dependent training profiles. Since the 
speech profile is constructed using HTML pages and 
components, it is universally accessible using its URL. This 
enables the user to call into any compatible Internet con- 
nected voice web system in user's proximity from anywhere 
in the world, identify himself/herself to the system and then 
enable the system to dynamically retrieve suitable informa- 
tion that enhances his/her navigation and access of the 
information stored in the voice web using voice commands 
and input. 

In addition to the user attribute information discussed 
above, the personal profile contains user preferences relative
to configuration, presentation and information selection.
These preferences are components within the personal pro- 
file pages and are easily available to the voice web system 
for dynamic retrieval. For example, if the user requests 
his/her stock portfolio from the voice web, it first retrieves 
the user's preferred portfolio of companies from his/her 
profile and applies this list to limit the search on stock quotes 
from all companies. The user gets exactly the information 
relevant to his/her interest in exactly the order of priority 
he/she prefers. 
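
The stock portfolio example can be sketched as a simple filter. The sketch assumes the preferred portfolio is an ordered list of ticker symbols taken from the profile and that all available quotes are held in a dictionary; both structures and the symbols themselves are hypothetical.

```python
def quotes_in_preference_order(all_quotes, preferred_symbols):
    """Limit the quote search to the user's portfolio, in priority order.

    Symbols the user prefers but for which no quote exists are skipped.
    """
    return [
        (symbol, all_quotes[symbol])
        for symbol in preferred_symbols
        if symbol in all_quotes
    ]


# Hypothetical data: three available quotes, user prefers CCC before AAA.
ALL_QUOTES = {"AAA": 10.5, "BBB": 20.0, "CCC": 7.25}
PREFERRED = ["CCC", "AAA", "ZZZ"]
```

Iterating over the preference list rather than over the quote table is what preserves the user's priority ordering.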

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a functional block diagram of a voice web
system in accordance with the present invention.

FIG. 2A is a functional block diagram of the voice web
system shown in FIG. 1 configured to provide voice web
services.

FIG. 2B is a functional block diagram of an exemplary
calendar service. 

FIG. 2C is a functional block diagram of an alternative 
configuration of a voice web system in accordance with the 
present invention.

FIG. 3 illustrates a personal voice web used to provide
personal services using the system shown in FIG. 2A. 

FIG. 4 illustrates a hierarchy of speech training pages that 
correspond to the service pages shown in FIG. 3. 

FIG. 5 illustrates a hierarchy of attributes and preferences 
pages that correspond to the service pages shown in FIG. 3. 

FIG. 6 is a flow diagram of a subscriber authentication 
method used in the delivery of the personal voice web 
services shown in FIG. 3.

FIG. 7 is a flow diagram of an enhanced speech recog-
nition process used in personal voice web systems shown
in FIG. 3. 

FIG. 8 is a flow diagram of a query customization process 
in accordance with the present invention.

FIG. 9 is a flow diagram of a voice publishing method in 
accordance with the present invention. 

FIG. 10 is a system diagram of a business-yellow-order
page system in accordance with the present invention.

DESCRIPTION OF A PREFERRED 
EMBODIMENT 

The figures depict a preferred embodiment of the present
invention for purposes of illustration only. One skilled in the
art will readily recognize from the following discussion that
alternative embodiments of the structures and methods illus- 
trated herein may be employed without departing from the 
principles of the invention described herein. 

System Description 

FIG. 1 is a functional block diagram of a voice web 
system 100 in accordance with the present invention. Voice 
web system 100 extends the conventional internet and world
wide web ("web" or www) technology to voice and speech
processing applications and also enables new uses for inter-
active voice response (IVR) technology. Voice web system
100 includes one or more voice web sites 102 coupled to one
or more voice web gateways 105 via the Internet 101. Voice
web sites 102 and voice web gateways 105 transfer files over
Internet 101 in accordance with the hypertext transport
protocol (HTTP). A subscriber 107 accesses the voice web
system 100 by coupling to the gateway 105 using a telephone
111 coupled to the public switched telephone network (PSTN)
109.

Internet 101 is a system of linked communications net-
works that facilitate communication among computers
which are coupled to Internet 101. Generally, internets such
as Internet 101 facilitate communication by providing file
transfer, electronic mail and news group services. Internet
101 is preferably the Internet which evolved from the
ARPANET and which is publicly accessible world wide. It
should be understood, however, that the principles of the
present invention apply to other internets and even closed
(private) networks such as corporate intranets.

It should be noted that system 100 may include numerous
voice web sites 102 and numerous voice web gateways 105.
A single voice web site 102 and a single voice web gateway
105 are shown in FIG. 1, however, to keep the figure
uncluttered. Thus, voice web system 100 is a collection of
voice web gateways 105 and voice web sites 102 connected
over Internet 101 enabling subscribers 107 to access voice
web pages 103 via their telephones as shown in FIG. 1.

A voice web page 103 is a web page specified using a
navigable markup language that includes voice extensions.
A navigable markup language is an enhanced type of
markup language that facilitates publication, navigation and
access of information stored in documents specified in the
navigable markup language. An exemplary markup lan-
guage is the Hypertext Markup Language 2.0, RFC 1866,
HTML working group of Internet Engineering Task Force,
Sep. 22, 1995, edited by D. Connolly, published
at the following uniform resource locator (URL) address:
http://w3.org/pub/www/Markup/html-spec

A markup language is a language that includes a set of
conventions for marking portions of a document so that,
when accessed by a parsing program such as a web browser,
each marked portion is presented to a user with a distinctive
format. In contrast to formatting codes used by word pro-
cessing programs, markup language codes, called tags, do
not specify exactly how the tagged portion should be pre-
sented. Instead the tags inform the web browser (parser) that
the information is in a certain portion of a document such as
title, heading, form or text and the like. The web browser
(parser) determines how to present the tagged information.

A navigable markup language is an enhanced markup
language that uses tags that are anchors and that are links.
When these link and anchor tags are invoked, a user is then
presented another navigable markup language document in
accordance with the link and anchor tags. This link is
sometimes called a hyperlink. A hyperlink is a reference to
another markup language document which when invoked
facilitates access of the referenced markup language docu-
ment.

A navigable markup language thus uses attributes, tags
and values that enable (i) a publisher to specify the presen-
tation of information to a user; (ii) a user to interactively
access the stored information; and (iii) a user to access other
navigable markup language documents using hyperlinks.

The navigable markup language used to specify voice
web pages 103 is Hyper Voice Markup Language (HVML).
HVML is a version of HTML that includes voice extensions
as described in Appendix A, incorporated herein by refer-
ence. Voice web pages 103 include HVML tags and
attributes that extend HTML to facilitate publication, navi-
gation and access to voice information. For example, HVML
specifies functions and protocols that facilitate voice and
speech processing including voice authentication, speaker
dependent speech recognition, voice information publishing
(e.g. creating a voice form) and voice navigation.

Just as conventional web documents are displayed for the
user, voice web documents 103 are "played" to a subscriber
over a telephone. A voice web page 103 is played (by voice
web browser 106) by sequentially presenting the embedded
voice components according to the HVML and MIME
specifications.

While a conventional web site enables on-demand access
over an internet to conventional web pages, voice web site
102 enables on-demand access to voice web pages 103.
Voice web site 102 is a computer that hosts voice web pages
103 and serves them up to other computers (i.e. voice web
gateway 105). More specifically, voice web server 102 is a
computer configured with conventional web server software
112 and which has access to stored voice web pages 103. A
voice web site 102 additionally optionally includes a sub-
scriber directory 104 that stores a list of registered system
subscribers. Voice web site 102 stores, serves and manages
voice web pages 103 and can execute associated external
scripts or programs in accordance with the present inven-
tion. These external scripts and programs interface with
databases and other information sources both internal and
external.

Voice web gateway 105 is a computer connected to the
Internet 101. Voice web gateway 105 also includes a con-
ventional voice telecommunications interface 114 for cou-
pling to the public switched telephone network (PSTN) 109
for telephonic communications with a subscriber 107. Tele-
phone 111 is any voice enabling telecommunications device.
Exemplary telephones include conventional desktop
telephones, portable telephones, cellular telephones, analog
telephones, digital telephones, smart phones and computers
configured to operate as a telephone or to perform telephony
functions. Thus voice web pages 103 are universally acces-
sible from any ordinary telephone 111. Alternatively,
subscriber 107 may access voice web pages 103 either by
using a subscriber interface local to voice web gateway 105
(i.e. a direct user interface with voice web gateway 105) or
by dialing into voice web gateway 105 using another com-
puter such as a personal digital assistant or a smart phone.
Telecommunications interface 114 serves as an
interface between a voice web browser 106 and telephone
111 and preferably includes conventional telephony and
voice processing hardware and software enabling voice web
gateway 105 to receive and answer telephone calls, respond
to touch tone and voice commands, route and conference
calls, play voice prompts and record voice messages.

Voice web gateway 105 additionally hosts a voice web 
browser 106. Voice web browser 106 is a computer program 
capable of accessing and processing voice web pages 103 in 
response to a request placed by subscriber 107. More 
specifically, voice web browser 106 (i) processes voice and 
touch tone activated subscriber commands, (ii) retrieves 
requested voice web pages 103 from the appropriate voice 
web site 102, (iii) interprets the embedded markup language 
(HVML) in the retrieved voice web page 103 and (iv) 
delivers the contents of a voice web page 103 to a subscriber 
107 over the telephone 111. In performing the above- 
mentioned processing, voice web browser 106 executes 
scripts, including "voice scripts" embedded in a voice web 
page 103. Voice web browser 106 provides a subscriber 107 
with fast, easy, convenient voice activated navigation and 
access to voice web pages 103. 
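
Steps (i) and (iii) can be illustrated with a small sketch that interprets a retrieved page and maps a touch tone or spoken key word to a hyper link. It assumes anchors of the form `<a href=... tone=... label=...>`, following the tone and label anchor attributes named in the HVML Navigation discussion. The exact HVML syntax is defined in Appendix A, which is not reproduced here, so the sample markup, the class name and the function name are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect hyper links keyed by their assumed tone and label attributes."""

    def __init__(self):
        super().__init__()
        self.by_tone = {}
        self.by_label = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # a plain anchor, not a hyper link
        if attributes.get("tone"):
            self.by_tone[attributes["tone"]] = href
        if attributes.get("label"):
            self.by_label[attributes["label"].lower()] = href


def resolve_command(page_source, user_input):
    """Return the link selected by a touch tone sequence or a recognized word."""
    collector = AnchorCollector()
    collector.feed(page_source)
    collector.close()
    if user_input in collector.by_tone:
        return collector.by_tone[user_input]
    return collector.by_label.get(user_input.lower())


# Hypothetical page fragment; file names are illustrative only.
SAMPLE_PAGE = """
<a href="calendar.hvml" tone="1" label="calendar">Calendar</a>
<a href="addressbook.hvml" tone="2" label="address book">Address book</a>
"""
```

In this sketch the recognized word is assumed to arrive as text; speech recognition and page retrieval over HTTP are outside its scope.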

Voice web browser 106 is a conventional web browser 
modified with appropriate voice information playback and 
recording extensions and enhancements. Appendix A 
includes a specification of HVML and voice web browser 
commands and is incorporated herein by reference. 

Some voice web pages 103 contain references to scripts 
and programs that operate as service agents 110 to respond
to subscriber requests as well as external events and carry 
out prescribed actions. These scripts and programs are 
externally stored on voice web sites 102 (for example as 
Common Gateway Interface (CGI) Scripts or Internet Ser- 
vices Application Programming Interface (ISAPI) 
programs). These external scripts and programs execute in 
the voice web server 102 environment as a service agent 
110. The external scripts and programs that comprise service 
agents 110 are referred to by URLs embedded in an asso- 
ciated voice web page 103. In the case of a voice web page 
103 that is a voice form, the script or program associated 
with the service agent executes in response to voice form 
submission by a subscriber 107. Service agents 110 follow 
standard Internet protocols such as HTTP, and conform to 
conventional formats such as MIME and application pro-
gramming interfaces (APIs) such as CGI and ISAPI.

HVML Description

Conventional web pages are designed primarily for pre-
sentation on a computer color monitor and navigation by a
mouse and key board. As such, graphics, images and text are
the primary media types supported widely. Although audio,
video and 3-dimensional graphics extensions are becoming
available, these extensions are directed primarily at com-
puter users and not telephone users.

Voice web pages 103 consist of HTML pages that have
been extended with Hyper Voice Markup Language
(HVML) for easy and effective navigation and access of
voice information via a voice activated device such as an
ordinary telephone. Voice web pages 103 retain all the
properties and behavior of conventional HTML pages such
as HTML markup tags, universal identifiers (URLs), and
hyper-links and can be accessed by a conventional web
browser using HTTP protocols from a conventional web
server. The additional markup tags are interpreted by an
HVML extended web browser to enable subscribers 107 to
navigate and access voice web pages 103 over the phone or
similar voice activated device. Appendix A includes a speci-
fication of HVML and voice web browser commands and is
incorporated herein by reference.

HVML web pages (voice web pages 103) are specially
designed for presentation using an ordinary telephone 111
and navigation using touch tones and voice commands. This
is in contrast to conventional multimedia web pages that
may embed audio data to be presented on a multimedia
personal computer using its speakers and navigated using its
mouse, key board and microphone. Although HVML voice
web pages 103 can be embedded in generic multimedia web
pages, thus sharing some of the information, they are
designed to be presented using an ordinary phone and
navigated using commands generated by touch tone signals
and speech recognition.

An HVML web page (voice web page 103) is first and
foremost an HTML page. Each web page 103 has a unique
universal resource locator (URL) (also called uniform
resource locator). A URL is a string of characters that
uniquely identifies an internet resource including (i) an
identification of the access protocol to be used; (ii) an indica-
tion of resource type; and (iii) an identification of its location
in the computer network. For example, the following fictitious
URL identifies a www document:
http://www.voiscorp.com/banner.gif
This URL uniquely identifies the location of a resource on
the world wide web computer network. "http://" indicates
the access protocol, "www.voiscorp.com" is the domain
name of the computer on which the resource is located,
"banner" is the name of the resource located on the computer
specified by the domain name, and "gif" indicates that the
banner resource is a gif (graphical interchange file) type
resource. Similarly, the following fictitious URL uniquely
identifies the location of a voice web page 103:
http://www.voiscorp.com/voicememo.hvml
In this example, "voicememo" is the name of the resource
located on the computer specified by the domain name, and
"hvml" indicates that the voicememo resource is an hvml
type resource. Thus, web pages 103 are each uniquely
identified by their corresponding URL. Once located, a web
page 103 can be created, edited and played using existing
web publication tools, it can be stored on any conventional
web server anywhere on the Internet, it can be accessed by
any conventional web browser and presented on a computer
monitor, it can be navigated using the computer's mouse,
keyboard, and (with some additional plug-ins) microphone,
and it can contain embedded anchors and hyper links to
other HTML pages, including other HVML pages.

Voice web pages 103 are designed for three primary
purposes: (i) presenting structured voice information to a
user; (ii) enabling the user to navigate across and within
voice pages; and (iii) capturing user input for information
queries or submission.

a. HVML Presentation

Presentation of voice information is accomplished prima-
rily by the voice tag. The voice tag has a type attribute which
specifies the type of voice information to be presented. If the
type attribute has the file value, the voice information is
obtained from a voice file specified by its URL. If the type
attribute has the text value, the voice information is synthe-
sized from the specified text. If the type attribute has the
number, ordinal, currency, date, or character value, then the
voice information is generated by concatenating voice frag-
ments from a pre-recorded indexed system voice file. If the
type attribute has the stream value, then the voice informa-
tion is obtained from the voice stream specified by its URL.
Composition of several voice elements into a seamless voice
is accomplished by the voice-string tag.

Combining these tags, publishers can compose and
present: (i) pre-recorded voice prompts and messages; (ii)
voice prompts generated using text-to-speech technology;
and (iii) pre-formatted voice prompts with dynamic speech
synthesis elements.

b. HVML Navigation

Navigation of voice web pages 103 is primarily accom-
plished by extending the HTML anchor tag with two new
attributes: tone and label. These attributes are used in
conjunction with the existing href attribute in an anchor
element that makes the anchor into a hyper link. When the
user selects the touch tone signals specified by the value of
the tone attribute or utters the word specified by the label
attribute, the browser invokes the corresponding hyper link.
Tone and label attribute values must be unique within a
page. Navigation is also accomplished by system commands
such as next, previous, reload, home, bookmarks, help, fax,
and history which are invoked by specific touch tone
sequences or utterance of the words. Users can control the
voice browser operations by issuing system commands such
as stop, start, play, pause, exit, backup, and forward. Using
these attributes, publishers can enable (i) touch tone com-
mand and control and link navigation; (ii) pre-defined,
system and user specific, spoken command and control key
word recognition; and (iii) page and user specific spoken
command and control key word recognition.

c. HVML Forms

HVML uses the form tag to enable user input similar to
HTML including the method attribute which specifies the
way parameters are passed to the server and the action
attribute which specifies the procedure to be invoked by the
server to process the form. HVML extends the input tag
within forms by introducing the voice-input tag. Voice-input
takes a type attribute similar to the input tag with three new
values "voice", "tone" and "review" in addition to the
existing "reset" and "submit" values. The HVML browser
pauses at each voice-input statement in an HVML form until
the specified input is supplied or input is terminated, before
processing the remaining form. Using these tags and
attributes, publishers can enable: (i) touch tone command
and control and parameter input; (ii) pre-defined, user
specific, spoken alphabet and digit input; (iii) page and user
specific, spoken key word and proper names input; and (iv)
free form voice information input.

Operational Description of the Voice Web Browser

Syntactic and structural intelligence, such as in-line
pre-recorded voice prompts, pre-formatted voice prompts with
dynamically generated voice elements, key word accessible
anchor elements, voice responsive hyper links etc. are
embedded in voice web pages 103 through voice access
extensions to HTML. Behavioral intelligence including
command interpretation, page access, file caching, HVML
interpretation and user interaction is embedded in voice web
browser 106 (the HVML browser). Voice web browser 106
has the following states: (i) waiting for user commands; (ii)
active accessing and playing HVML pages; and (iii) paused
for user input.

Initially, voice web browser 106 is launched upon the
system's receipt of a subscriber's telephone call. Once
launched, voice web browser 106 goes through an initialization
sequence that includes subscriber authentication and
normally becomes "active" accessing and playing the
subscriber's home page. Once the home page is played, voice
web browser 106 "waits" for subscriber commands. As part
of playing the page, the browser may "pause" for subscriber
input and continue once the input is provided.

Independent of any specific voice web page 103 that a
subscriber may be accessing, voice web browser 106 provides
a set of navigational and operational commands.
Within the telephone key pad, "*" and "#" are special keys
that generate unique tones. Voice web browser 106 has
special meaning for these keys. In general, the "*" key
followed by a sequence of touch tones, excluding the [...]
key, signals a browser command, an escape or a skip and the
"#" key signals a link activation, termination of form input,
termination of a key sequence or a selection.

Voice Web Services

Voice web system 100 can be used to provide voice web
services to a subscriber 107. A voice web service is a service
that provides on-line telephone based access to information.
The information is presented to the user through the publication
of voice web pages 103. The information presented to
(published for) the subscriber may be information retrieved
from a single information source or a combination of
information sources including publicly accessible on-line
databases, information proprietary to voice web system 100,
information previously stored by subscriber 107 or another
information source. Exemplary services provided by voice
web system 100 include (i) personal information services
such as calendar, address book, electronic mail, voice mail,
(ii) information services such as headline news, weather
reports, sports score, stock portfolio quotes, business white
pages, yellow pages, classified information and (iii) transaction
services (commerce services) such as banking, bill
payments, stock trading, airline, hotel and restaurant
reservations and catalog store orders.

Users gain access to voice web services by becoming
voice web subscribers 107. Subscribers 107 preferably sign
up (e.g. register) for services through a service provider. In
one embodiment, each subscriber 107 is assigned a unique
account number on a calling card and subscribers 107 access
the voice web system 100 by dialing a single "800" (e.g. toll
free) service phone number and by then supplying their
account number via the telephone 111. In an alternative
embodiment, the services are publicly available and any user
placing a call into the system is processed as a subscriber
107 without requiring any registration.

FIG. 2A is a functional block diagram of a voice web
system 200 configured to provide voice web services to a
subscriber 107. Voice web system 200 includes one or more
[...] 202 via internet 101. Service site 200 is a voice web site 102
configured to provide voice web services. Each voice web
service is implemented using a collection of service agents
201 and service pages 203 centered around a service database
202. Additionally, service site 200 optionally includes
a personal profile 204 to be used to the extent that the service
being provided requires pre-stored subscriber-specific
information (i.e. pre-stored information personal to the particular
subscriber).

Voice web service agents 201 are a type of service agent
110 (shown in FIG. 1) that execute on service site 102 to
provide voice web services to a subscriber 107. Voice web
service agents 201 are therefore scripts and programs
represented by a web page 103 (shown in FIG. 1).

Service database 202 is a database of service information.
[...] information varies with the type
of service being provided. For example, if voice web system
100 is configured to deliver a business white page service,
then service database 202 is a database of address and phone
number listings for businesses. If voice web system 100 is
additionally or alternatively configured to deliver news
headlines, then voice web system 100 includes a service
database 202 that includes current news headlines.

Service forms and pages 203 are voice web pages 103 that
[...] HVML templates (voice forms and pages) that are "filled
[...] a specific subscriber request. Service
pages 203 are used to gather subscriber input, to
[...] information and to deliver (publish) information to
a subscriber. Some service pages 203 are database entry and
administration forms, some are database query forms and
[...] database response pages. Entry forms are used to
add information to the database. Query forms are used to
[...] information from the database. Response pages are
used to present retrieved information to the user. In the
preferred embodiment, service agents dynamically generate
[...] from service database 202 and using the retrieved data in
[...] variables stored in an HVML template.
HVML templates link to each other specifying
request-response dependencies. Thus, subscribers 107 are
able to enter and retrieve information in personal and
external databases over internet 101 using web protocols
without having to create a voice web page for each entry in
service database 202.

Service agent 201 typically uses a service database 202
and a set of service pages and forms 203 to provide the
corresponding voice web service. The service database 202
hosts the information that subscribers 107 wish to access.
The service forms allow subscribers 107 to input and query
information in service database 202. Service pages allow
service agents 201 to present the requested information to
the subscriber 107 using voice web browser 106.

FIG. 2B is a functional block diagram of an exemplary
calendar service. The calendar service agent 210 uses the
calendar database 211 together with the calendar and
appointment details input and query voice web forms 212
and appointment list and details voice web pages 213.
Subscribers fill in the calendar and appointment details input
voice web forms 212 to set their calendar appointments and
their details. The calendar service agent 210 processes the
submitted form and updates the calendar service database
211. Later, subscribers can retrieve their appointments for
any day by supplying 214 the month, date and year for that
day in the calendar query voice web form 212. The calendar 
service agent 210 processes the submitted form, retrieves the 
matching appointments from the calendar database, and 
dynamically composes and returns the appointment list 
voice web page 213. If the subscriber requests the details
of any appointment, the calendar service agent 210 dynamically
generates and supplies the corresponding appointment
details page 213.
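
The calendar flow just described (entry form, query form, appointment list page, details page) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the method names, the dictionary-shaped forms and the in-memory store are all hypothetical stand-ins, not the agent or database of the described system.

```python
from collections import defaultdict
from datetime import date


class CalendarAgent:
    """Illustrative stand-in for calendar service agent 210 (hypothetical API)."""

    def __init__(self):
        # In-memory stand-in for calendar database 211: date -> appointments.
        self._db = defaultdict(list)

    def submit_entry_form(self, form):
        """Process an appointment input form and update the database."""
        day = date(form["year"], form["month"], form["day"])
        self._db[day].append(
            {"time": form["time"], "title": form["title"],
             "details": form.get("details", "")}
        )

    def _appointments(self, form):
        # Look up the day named in a query form, ordered by "HH:MM" time.
        day = date(form["year"], form["month"], form["day"])
        return day, sorted(self._db.get(day, []), key=lambda a: a["time"])

    def submit_query_form(self, form):
        """Process a query form and compose the appointment list page."""
        day, items = self._appointments(form)
        if not items:
            return f"You have no appointments on {day.isoformat()}."
        lines = [f"{n}. {a['time']} {a['title']}" for n, a in enumerate(items, 1)]
        return f"Appointments for {day.isoformat()}:\n" + "\n".join(lines)

    def details_page(self, form, number):
        """Compose the details page for the numbered appointment of a day."""
        _, items = self._appointments(form)
        chosen = items[number - 1]
        return f"{chosen['time']} {chosen['title']}: {chosen['details']}"
```

In this sketch the returned strings stand in for dynamically composed voice web pages; a real agent would emit voice page markup for the browser to play rather than plain text.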

The Personal Voice Web 

FIG. 3 shows a personal voice web 300 in accordance 
with the present invention. Personal voice web 300 is a
standardized collection of linked voice web pages and voice
web forms (a special type of voice web page) that form a 
personal service space for the subscriber. Preferably, all 
subscribers share a common structure of linked voice web 
pages although the contents of personal voice web pages 
vary from subscriber to subscriber. Because each subscriber 
of the personal voice web system 300 has the linked page 
structure shown in FIG. 3, subscribers navigate about and 
access information from their personal voice web 300 in a 
standardized way. Each page in personal voice web 300 
includes an agent that performs various processing tasks 
required for each respective page. At the root of personal 
voice web 300 is the personal home page 301. Personal 
home page 301 links to a personal profile page 302, a
personal administrative assistant page 303, a personal help- 
desk page 304, and a personal commerce page 305. 
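
As a rough illustration of the standardized link structure, the pages can be modeled as a small tree rooted at the home page. The `VoicePage` class and the `find_path` helper are hypothetical names introduced only for this sketch; the page numbers follow the text (pages 309 and 310 are two of the service pages linked from the administrative assistant page).

```python
from dataclasses import dataclass, field


@dataclass
class VoicePage:
    """A node in the personal voice web (hypothetical model, not HVML)."""
    number: int
    title: str
    links: list = field(default_factory=list)


def build_personal_voice_web():
    """Build the top of the hierarchy: home page 301 and its linked pages."""
    assistant = VoicePage(303, "personal administrative assistant page", [
        VoicePage(309, "calendar and appointments page"),
        VoicePage(310, "address book page"),
    ])
    return VoicePage(301, "personal home page", [
        VoicePage(302, "personal profile page"),
        assistant,
        VoicePage(304, "personal helpdesk page"),
        VoicePage(305, "personal commerce page"),
    ])


def find_path(root, number, trail=()):
    """Return the page numbers visited from the root to the target page."""
    trail = trail + (root.number,)
    if root.number == number:
        return list(trail)
    for child in root.links:
        found = find_path(child, number, trail)
        if found:
            return found
    return None
```

Because every subscriber shares the same structure, the same path (for example home page, then administrative assistant, then calendar) reaches the corresponding page in any subscriber's personal voice web.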

The personal administrative assistant page 303 is linked to 
a number of personalized voice web services (service pages) 
330 including, by way of an example, a calendar and 
appointments page 309, an address book page 310, a stock 
portfolio page 311, a news headlines page 312, a mail box 
page 313, and a business white pages home page 314. 

Calendar and appointments page 309 is used to provide an 
appointments service. The appointments service enables a 
subscriber to track personal and business appointments in a 
voice-based calendar. The subscriber thus adds and retrieves 
appointments over the phone using personal voice web 300. 
In addition to providing day and time information related to 
stored appointments, a subscriber may also store voice note
annotations that are associated with a particular appointment.

Address book page 310 is used to provide an address 
service. The address service enables a subscriber to add and 
retrieve address, phone number, and other information 
related to individual names or company names. The information
added and retrieved is stored in an address book
service database private to the subscriber.

Stock portfolio page 311 is used to provide a stock quote 
service. The stock service enables a subscriber to retrieve 
current stock pricing and portfolio valuation information as 
well as statistical information related to changes in portfolio 
or stock positions. The stock service uses information 
retrieved from a stock portfolio service database private to 
the subscriber and additionally retrieves current stock pricing
information from an on-line database or information
source. 

News headlines page 312 is used to provide a news
service. The news service enables a subscriber to retrieve 
news headlines related to subscriber customized topics. 

Mail box page 313 is used to provide a mailbox service. 
The mailbox service enables a subscriber to access elec- 
tronic mail (e-mail) messages. The e-mail messages are 
played for the subscriber using text to speech conversion and 
a speech synthesizer. 
Business white pages home page 314 is used to provide a
white page service. The white page service enables a subscriber
to enter a partial company name, and optionally a city
name and state code, to retrieve the company's full name,
address and phone number.
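
A partial-name lookup of this kind might be sketched as below. The `Listing` record, the `lookup` function and the matching rules (case-insensitive substring match on the name, exact match on the optional city and state) are assumptions made for illustration; the text does not specify how the white page service matches names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    """One business white page entry (hypothetical record layout)."""
    name: str
    address: str
    city: str
    state: str
    phone: str


def lookup(listings: List[Listing], partial_name: str,
           city: Optional[str] = None,
           state: Optional[str] = None) -> List[Listing]:
    """Return listings whose name contains partial_name, narrowed by
    city and state code when those optional inputs are supplied."""
    needle = partial_name.strip().lower()
    matches = []
    for entry in listings:
        if needle not in entry.name.lower():
            continue
        if city and entry.city.lower() != city.strip().lower():
            continue
        if state and entry.state.upper() != state.strip().upper():
            continue
        matches.append(entry)
    return matches
```

Supplying the optional city or state narrows an ambiguous partial name to fewer candidates, which matters more over a telephone than on a screen because each extra result has to be read aloud.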

Each service page 309-314 is part of a collection of voice 
forms and pages that are used by the corresponding service 
agent to retrieve a request from the subscriber, generate an
appropriate database query responsive to the subscriber
request, retrieve subscriber-requested information, and generate
a voice web page that incorporates the retrieved
information and that is adapted for presentation
(publication) to the subscriber using a voice web browser.
Thus, for example, the service agent associated with calendar
and appointments page 309 generates a voice form for
prompting a subscriber for month, day and year information.
After receiving the prompted information, the calendar and
appointments service agent generates the appropriate query
to extract the requested calendar information from a calendar
service database. Once the calendar information is
retrieved from the database, the calendar and appointments 
service agent generates a voice web page that includes the 
retrieved information. The new page is then presented 
(published) to the subscriber over the telephone by the voice 
web browser. 
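
The four agent steps named above (gather the prompted input, build a query, retrieve the data, generate a page from a template) can be separated as in the following sketch. It assumes Python's standard `string.Template` as a stand-in for a page template; the placeholder syntax, the function names and the tuple-keyed dictionary used as a database are hypothetical and are not the HVML template format.

```python
from string import Template

# Fields the voice form prompts for, in order.
PROMPTS = ("month", "day", "year")

# Stand-in page template with placeholders for the retrieved data.
PAGE_TEMPLATE = Template("Appointments for $month/$day/$year: $items")


def gather_input(answers):
    """Step 1: validate that every prompted field was supplied."""
    missing = [name for name in PROMPTS if name not in answers]
    if missing:
        raise ValueError(f"missing form fields: {missing}")
    return {name: int(answers[name]) for name in PROMPTS}


def build_query(fields):
    """Step 2: turn the form fields into a database key."""
    return (fields["year"], fields["month"], fields["day"])


def generate_page(fields, rows):
    """Step 4: fill the template with the retrieved rows."""
    items = "; ".join(rows) if rows else "none"
    return PAGE_TEMPLATE.substitute(items=items, **fields)


def handle_request(database, answers):
    """Run all four steps for one subscriber request."""
    fields = gather_input(answers)
    rows = database.get(build_query(fields), [])  # Step 3: retrieve.
    return generate_page(fields, rows)
```

Keeping the steps separate mirrors the description: the same gather, query, retrieve and publish sequence applies to the other service agents, with only the prompts, the query and the template changing.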

Each of the other personal service agents associated with
personal service pages 308-327 operates in a similar way to
provide a subscriber with information retrieved from associated
service databases.

Personal helpdesk page 304 is linked to personal voice 
web helpdesk service pages 331 including, by way of 
example, a hotels page 315, an airlines page 316, a rental 
cars page 317, a travel agents page 318, a restaurants page 
319, a financial services page 320, and a banks page 321. 

The personal helpdesk page has an associated personal
helpdesk agent that is used to provide a set of helpdesk 
services. Helpdesk services enable a subscriber to access 
product, pricing, availability and other information of the 
corresponding services. 

Hotels page 315 is used to provide a hotel reservation
service. Airlines page 316 is used to provide an airline 
booking service. Rental cars page 317 is used to provide a 
rental car reservation service. Travel agents page 318 is used 
to provide a travel service. Restaurants page 319 is used to
provide a menu and reservations service. Financial services
page 320 is used to provide a financial service. Banks page
321 is used to provide a bank service.

Personal commerce page 305 is linked to personal voice
web commerce service pages 332 including, by way of
example, an apparel shops page 322, a luggage stores page
323, a gift shops page 324, a flower shops page 325, an office
supplies stores page 326, and a book stores page 327. The
personal commerce page provides commerce services that
enable a subscriber to access catalogs associated with
various retail establishments. As part of the commerce
service, the personal voice web allows a subscriber to shop
in various catalogs and then submit orders for selected items
directly to the sponsor of the associated catalog. Orders are
submitted to the catalog sponsor either as a voice web form
or conventional web form sent to the sponsor, as an electronic
message or using another means.

Personal profile page 302 links to a set of personalized
voice web profile pages including an authentication page
306, a speech profile page 307, and an attributes and
preferences page 308.

User authentication page 306 contains authenticating 
information including a subscriber account number, an 
encrypted password or personal identification number and
links to a voice authentication signature MIME resource.

Speech profile page 307 is linked to a hierarchy of speech
training pages that correspond to the hierarchy of personal
voice web 300. FIG. 4 shows the hierarchy 400 of speech
training pages 401-427. Speech training pages 401-427 are
sets of pre-captured training files to be used in performing
speaker dependent speech recognition in providing the
corresponding service to a subscriber. Each speech training
page is thus accessed by the corresponding agent in performing
the corresponding service. For example, the administrative
assistant service accesses administrative speech
training set 431 (including speech training pages 409-414).
The helpdesk service accesses the helpdesk [...]
432 (including speech training pages 415-421). The commerce
service accesses the commerce training page set 433
(including speech training pages 422-427).

Each speech training page 401-427 includes training data
specifically tailored to the words more commonly associated
with the corresponding service. For example, the calendar
speech training page 409 includes training vocabulary to aid
in the recognition of voice commands such as "Tenth",
"November", "Tuesday" and so forth.

Referring now again to FIG. 3, personal attributes and
preferences page 308 includes subscriber attribute information
including name, account number, address, voice telephone
number, fax telephone number, paging telephone
number, encrypted credit card numbers and the like as well
as personal preference information such as configuration,
selection and presentation preferences. Personal attributes
and preferences page 308 is also linked to a hierarchy of
attribute and preferences pages (shown in FIG. 5) that
correspond to the hierarchy of personal voice web 300.

FIG. 5 shows the hierarchy of attributes and preferences
pages 501-527 associated with personal attributes and
preferences page 308. Attributes and preferences pages 501-527
are pages that store subscriber-specific preference information
to be used in providing the corresponding service to a
subscriber. Each attributes and preferences page 501-527 is
thus accessed by the corresponding agent in performing the
corresponding service. For example, the administrative
assistant service accesses attributes and preferences set 531
(including attributes and preferences pages 509-514). The
helpdesk service accesses the helpdesk attributes and
preferences set 532 (including attributes and preferences pages
514-521). The commerce service accesses the commerce
training page set 543 (including attributes and preferences
pages 522-527).

It should be noted that the user profile information for
multiple subscribers is stored in user profile databases. The
user profile databases are accessed by service dependent
profile agents. For example, personal identification and
verification information of multiple subscribers is stored in
a user profile home page database (a service database) and
accessed by the subscriber's profile home page agent.
Calendar attributes and preferences information for multiple
subscribers is stored in the subscriber calendar attributes and
preferences profile database (a service database). Calendar
service specific speech training information for multiple
subscribers is stored in the subscriber calendar speech
training profile database (a service database). Calendar
service profile agent responds to HTTP form requests for
calendar attributes and preferences or calendar speech
training profile page information for any particular subscriber
and supplies the appropriate subscriber profile page
information as HVML voice web pages.

The collection of profile pages for a single user constitutes
that user's personal voice web profile 300. Personal Voice
Web profile 300 need not be a collection of static HVML
pages [...] instead be generated dynamically
using user profile page databases. However, once
generated, these profile pages can be reused from various
[...] voice web system without having
[...] databases, thus saving
significant time and resources.

[...] subscriber [...] preferences, speech training
[...] information from the corresponding
[...] subscriber and service specific
[...] personalizing the voice web service forms
[...] enhancing and improving speech
recognition by embedding the speech training profiles in the
corresponding voice web forms and pages.

Referring back to FIG. 2B, for example, the calendar
service agent 210 uses a corresponding calendar service
profile agent 215 to retrieve subscriber specific calendar
attributes and preferences included in profile database 216
by specifying the subscriber's calendar attributes and
preferences profile URL as part of a profile request web form.
Calendar service profile agent 215 responds to the submitted
web form, retrieves the requested subscriber information
from the calendar service profile database 216 and delivers
it to calendar service agent 210 as a table formatted web
page. Calendar service agent 210 retrieves the requested
information from the table format in the web page and uses
the subscriber's attributes and preferences to customize the
voice web service form and page templates 213 before
presenting them to the subscriber. In this way, the subscriber
can have a personalized form or page presented to him/her
without having to supply information about himself/herself
repeatedly in each call.

Similarly, calendar service agent 210 uses a corresponding
calendar service profile agent 215 to retrieve subscriber
specific calendar speech training profiles from profile database
216 by specifying the subscriber's calendar speech
training profile URL as part of a profile request web form.
Calendar service profile agent 215 responds to the submitted
web form, retrieves the requested subscriber information
from the calendar service profile database 216 and delivers
it to the calendar service agent 210 as a table formatted web
page. The calendar service agent 210 retrieves the requested
information from the table format in the web page and
embeds the subscriber's speech training profiles in the voice
web form and page templates (pages 212, 213) before delivering
them to the voice web browser. The voice web browser
uses these speech training profiles to dynamically change the
active vocabulary in the voice processing software and
hardware thereby customizing it to the subscriber.

FIG. 2C is a functional block diagram of an alternative
configuration of a voice web system in accordance with the
present invention. The system includes a computer configured
as a combined voice gateway and voice web site
(combined site) 220. Combined site 220 includes gateway
components such as a voice and telephony interface 114, a
voice web browser 106 and server software 112. Combined
site 220 additionally includes voice web site components
such as service agents 201, service database 202 and service
forms and pages 203. Combined web site 220 provides voice
web access to a subscriber 107 coupling to the combined site
220 via the PSTN 109. Because the voice gateway and voice
web site functions are combined within a single computer
environment, the server software 112 (located in combined
site 220) and the voice web browser 106 exchange files
without suffering the delays imposed by routing across the
Internet 101. In certain applications, for example when a
subscriber is accessing personal databases, this configuration
is advantageous to improve system performance. It should
be noted, however, that even though server software 112
(located on combined site 220) and voice web browser 106
exchange files using a local interface as opposed to Internet
101, they nonetheless exchange files in accordance with
HTTP.

Voice web browser 106 communicates with other web
sites (such as web sites 224 and 225) using Internet 101.
Web site 224 is a computer coupled to Internet 101 configured
with server software 112, service agents 201, service
database 202 and service forms and pages 203. Web site 224
is configured to deliver voice web services as described in
reference to FIGS. 2A and 2B.

Web site 225 is a computer configured with server software
112, a profile service agent 223, service forms and
pages 222 and profile database 221. Web site 225 is a
universally accessible profile web site that is accessed by
any other web site or web gateway in the voice web system
as long as the accessing web site or web gateway has the
appropriate URL information. Web site 225 provides user
profile information to web site agents (such as service agents
201) located on other web sites (such as web site 224 and
combined site 220). Advantageously, any web site and/or
web gateway can thus access information stored in the
[...]

User Authentication and Verification

Personal voice web system 300 uses a login agent as a
gatekeeper to the access of each subscriber's personal voice
web. The login agent is a distributed software program that
can receive subscriber information over a telephone, access
the subscriber's personal profile pages from the subscriber's
personal voice web and verify the subscriber's credentials
over the telephone.

Each system subscriber is given (i) an account number, (ii)
a personal identification number (PIN) and (iii) a service
calling number. In order to access a personal voice web, the
subscriber calls the service calling number and uses account
information and the PIN to initiate a subscriber authentication
process. FIG. 6 is a flow diagram of a subscriber
authentication method 600 in accordance with the present
invention. The subscriber authentication method 600
includes authentication signature creation form processing
and subscriber authentication processing.

A subscriber initiates access 601 of his or her personal
voice web 300 by calling the service calling number using
a conventional telephone or a similar voice activated device
or computer configured to access the public telephone network.
After the subscriber initiates access 601, a login agent starts
login processing 602.

During login processing 602, the login agent answers the
call and presents a standard login form to the subscriber. A
login form is a voice form for collecting and submitting
login information including subscriber account number and
the subscriber PIN. After a subscriber enters the login
information (into the login form) and submits the login form,
the login agent uses the login information to retrieve the
URL of the subscriber's personal voice web home page 301.
The login agent retrieves the URL by looking up the
subscriber's account number in the voice web subscriber
directory. The login agent additionally verifies the PIN
which was submitted. Upon verification of the PIN, the login
agent presents 603 the subscriber's voice authentication
form to the subscriber over the telephone. As part of the
presentation, the login agent requests the subscriber to
supply a personalized voice authentication sample. The
login agent then waits 604 for the subscriber to supply the
sample and submit 605 the form. After the subscriber
submits 605 the form, the login agent processes 606 the
submitted form. During processing 606 of the submitted
form, the login agent accesses the subscriber's personal
authentication page from the subscriber's personal voice
web profile (linked to the subscriber's home page) and
attempts to retrieve the voice authentication signature. If this
is the first time the subscriber is accessing the service, the
signature will be missing from the subscriber's authentication
page. In this case, the login agent presents 607 the
authentication signature creation form to the subscriber.
Using the options presented in the signature creation form,
the subscriber selects the option to create or modify the
personal voice authentication signature. Following the
instructions provided by the login agent, the subscriber fills
in the authentication signature creation form and
records a personalized voice phrase as an authentication
signature. After filling in 608 the signature creation form, the
subscriber submits the form to the login agent. The login
agent waits until the signature creation form is submitted
609. The login agent then processes 610 the recorded phrase,
converting it into a signature pattern and linking it to the user
authentication page as a MIME resource for future
verification.

If, however, after processing 606, the login agent determines
that there is an authentication signature stored in the
subscriber's personal profile, then the login agent performs a
test 611 to determine whether there is a match between the
stored authentication signature and the voice sample submitted
by the subscriber. If test 611 determines that there is
a match between the sample and the signature, then the
subscriber is given access to the personal voice web and the
voice web. Test 611 uses conventional voice authentication
methods. A "match" is determined by test 611 when the
conventional voice authentication method determines that
the speaker's voice print or voice signature matches a master
[...]. If, however, the test determines that there is not a
match [...] signature, then the
subscriber is denied access 613.

Enhanced Speech Recognition

Automatic speech recognition falls into three categories:
speaker dependent, speaker adaptive, and speaker independent.
A speaker dependent system is developed to work for
a single speaker and is usually easier to develop, cheaper
and more accurate but requires the use of user-specific
speech training files.

The size of the vocabulary of a speech recognition system
affects the complexity, processing requirements and the
accuracy of the system. Referring now again to FIG. 3,
personal voice web 300 uses small to medium sized
vocabularies (tens to hundreds of words).

An isolated-word or discrete speech system operates on
single words at a time requiring a pause between each word
utterance. This conventional type of speech recognition is a
simple form of recognition to perform because the end
points are easier to find and the pronunciation of a word
tends not to affect others. As the occurrences of the words
are more consistent and sharply delimited they are easier to
recognize. Personal voice web 300 focuses on discrete
speech and in particular on speech used for command and
control.

Personal voice web 300 typically uses speech coded at 8
kHz using 8 bit samples resulting in 64 kbps bandwidth and
storage. Conventional adaptive differential pulse code modulation
(ADPCM) techniques can reduce the bandwidth to 16 kbps
without loss of information.

Personal voice web 300 uses conventional speaker dependent
recognition of discrete speech. This conventional
speaker dependent recognition relies on digital sampling of
the word utterances. After sampling, the next stage is
acoustic signal processing. Most techniques include spectral
analysis. This is followed by recognition of phonemes,
groups of phonemes and words. This stage uses many
conventional processes such as Dynamic Time Warping,
Hidden Markov Modeling, Neural Networks, expert systems
and combination of techniques. Hidden Markov Modeling
based techniques are commonly used and generally the most
successful approach. Additionally, personal voice web 300
uses some knowledge of the language to aid the recognition
process.

Personal voice web 300 improves speaker dependent
recognition of discrete speech in a command and control
context using universally accessible personal speech training
profiles 401-427. As described above, the personal
speech training pages 401-427 are organized as a linked
collection of voice web profile pages each linked to the
corresponding personal voice web service page. Thus, the
personal speech training profile pages parallel the personal
voice web service pages in structure as shown in FIGS. 3 and
5. Each speech training page 401-427 contains the training
vocabulary for browser command and control that is context
dependent.

Each service page 301-327 linked to the personal voice
web home page 401 has a corresponding speech training
page 402-427. The personal voice web 300 is constructed in
such a way that each voice web service page 302-327 links
to its corresponding speech training page 401-427 using its
URL. As the subscriber navigates from service page to
service page in the personal voice web 300, the system is
able to access the corresponding speech training page using
its embedded URL.

Each speech training page 401-427 contains a set of
command and control key words and their personalized
speech recognition patterns representing the context sensi-

loads the personal voice web profile page 302 and the speech
profile page 501 containing the command and control
vocabulary for the home page. This vocabulary includes the
basic voice web browser command and control as well as
home page specific command and control. From the home
page, the subscriber requests a particular service (i.e. personal
administrative assistant, the personal helpdesk or the
personal catalog store). The home page agent determines
703 what service the subscriber has selected and in response,
invokes 704 the selected service and then proceeds to deliver
705 the service. During invocation 704 of the service, both
[...] improve speech recognition.

During delivery 705 of the selected service the service
agent [...] the speech training page associated with the
[...] 720 by the subscriber. Specifically, the service agent obtains
the speech training profile [...]
a MIME resource and forwards it to the voice web browser
[...] recognition.
Thus, responding to the subscriber's voice commands pertinent
to the accessed voice web service page, the voice web
browser recognizes the command and control word utterances
[...] subscriber's voice commands that are submitted
(720) and matches them against the personalized vocabulary
[...] corresponding voice web speech training page for
[...] dependent recognition of discrete speech.

If the subscriber requests access to a new service page
[...] currently accessible service page, the currently
[...] agent exits 706 the current service and then
invokes 704 the requested service. During the invocation of
the requested service, the requested voice web service page
corresponding to the requested service is loaded as well as
the corresponding speech training page containing the
matching command and control vocabulary. In this process
[...] appropriate
vocabulary for the existing context thereby greatly
reducing the size of the active vocabulary that needs be
accessed while significantly improving the speaker dependent
recognition.

Query Localization and Customization

Query customization uses stored subscriber attributes and
preferences to customize queries of service databases. Query
customization is accomplished by maintaining user
attributes and preferences in a collection of voice web pages

tive vocabulary for the corresponding service page. For 501-527 (described above in reference to FIG. 5) that 

example, the calendar and appointments service page 309 is 50 parallel the corresponding voice web service pages 301-327 

linked to a corresponding speech training page 409 contain- (described above in reference to FIG. 6) and using the 

ing key words and recognition patterns for "year", "month", attribute and preferences information corresponding to the 

"day", the names of the months and days, digits representing service requested to customize the query parameters within 

dates and times etc. Similarly, stock portfolio page 311 is forms. 

linked to a corresponding speech traimng page 411 contain- 55 Referring now again to FIG. 5, the attributes and prefer- 

ing key words and recognition patterns for "stock", "quote", ences pages 501-527 parallel the personal voice web service 

"volume", "option", "symbol", names of companies in the pages 301-327 in structure as shown in FIG. 3. Each service 

portfolio etc. page linked to the personal voice web home page 301 has a 

FIG. 7 is a flow diagram of a speech recognition process corresponding voice web attributes and preferences page 

700 in accordance with the present invention. The process is 60 linked to it. The personal voice web 300 is constructed in 

initiated after a subscriber has gained access 701 to the such a way that each voice web service page 301-327 links 

personal voice web in accordance with the process described to its corresponding voice web attributes and preferences 

in reference to FIG. 6. Once the subscriber gains access to page 501-527 using its URL. As the subscriber navigates 

the personal voice web 701, the login agent accesses the from service page to service page in the personal voice web 

subscriber'spersonal voice web home page and presents 702 65 300, the system is able to access the corresponding voice 

the home page to the subscriber over the phone. During the web attributes and preferences page using its embedded 

process of presenting 702 the home page, the login agent URL. 
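The context-dependent vocabulary switching described for process 700 can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the class names, the URLs and the byte-string "patterns" are hypothetical, and exact matching stands in for a real speaker dependent recognizer, which would compare acoustic models instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpeechTrainingPage:
    """Hypothetical stand-in for a speech training page such as 409 or 411."""
    url: str
    # key word -> personalized recognition pattern (opaque bytes in this sketch)
    patterns: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class ServicePage:
    """Hypothetical service page carrying the URL of its training page."""
    url: str
    training_url: str


class VoiceBrowser:
    def __init__(self, training_pages: Dict[str, SpeechTrainingPage]):
        self._training = training_pages
        self.active_vocabulary: Dict[str, bytes] = {}

    def invoke(self, page: ServicePage) -> None:
        # Loading a service page also loads its training page, so only the
        # vocabulary for the current context is active.
        self.active_vocabulary = dict(self._training[page.training_url].patterns)

    def recognize(self, utterance: bytes) -> Optional[str]:
        # Placeholder matcher: exact comparison instead of acoustic matching.
        for word, pattern in self.active_vocabulary.items():
            if pattern == utterance:
                return word
        return None


training = {
    "profile/calendar": SpeechTrainingPage(
        "profile/calendar", {"month": b"pat-month", "day": b"pat-day"}),
    "profile/stocks": SpeechTrainingPage(
        "profile/stocks", {"quote": b"pat-quote", "volume": b"pat-volume"}),
}
calendar = ServicePage("service/calendar", "profile/calendar")
stocks = ServicePage("service/stocks", "profile/stocks")
```

Because each invocation replaces the active vocabulary, an utterance is only ever compared against the handful of key words that make sense on the current page.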
A subscriber of voice web services requests information by accessing a
voice web service page and having it played by the corresponding agent
(i.e. administrative assistant, helpdesk or commerce agent). The
subscriber requests service through submitting a query form presented
by the corresponding agent. The query form is an HVML form for touch
tone and voice data input. When a service is requested by the
subscriber, the agent retrieves the corresponding voice web attributes
and preferences page and automatically fills the query form with
appropriate default parameters obtained from the subscriber's
attributes and preferences. For example if the subscriber is accessing
the weather service page, the agent fills in the subscriber's home
town and other chosen cities automatically from the subscriber's
attributes and preferences page. Similarly, if the subscriber is
accessing the stock portfolio service page, the agent accesses the
corresponding attributes and preferences page and fills in the
subscriber's chosen portfolio of stocks in the query form. In
addition, the agent also automatically fills in the appropriate
subscriber attributes such as his/her access account number, password
etc., thereby easing the subscriber's access while exploiting the
availability services through web based queries.

FIG. 8 is a flow diagram of a query customization process 800 in
accordance with the present invention. The process is initiated after
a subscriber has gained access 801 to the personal voice web in
accordance with the process described in reference to FIG. 6. Once the
subscriber gains access 801 to the personal voice web, the login agent
accesses the subscriber's personal voice web home page and presents
802 the home page to the subscriber over the phone.

During the process of presenting 802 the home page, the login agent
loads the attributes and preferences page 501 from the subscriber's
voice web personal profile. Attributes and preferences page 501
contains preferences for the home page 301. From the home page 301,
the subscriber accesses the targeted voice web service page by
navigating the appropriate hyper links from the voice web home page.
In response, the selected service is invoked 803 and the selected
service then proceeds to deliver 804 the service. During invocation
803 of the selected service, both the service page and the attributes
and preferences page associated with the service page are extracted by
the service agent.

During delivery 804 of the selected service, the service agent uses
the attributes and preferences page associated with the selected
service to customize queries of the associated service database. More
specifically, using the attributes and preferences information, the
service agent automatically fills in the needed fields in the
corresponding query form with user specified defaults and preferences.
Having filled the appropriate fields, the service agent plays the
remaining query form to the subscriber thereby greatly reducing the
information that the subscriber has to supply on the telephone. The
service agent then obtains the remaining information, if any, from the
subscriber and submits the query form to the service database. When
the results are returned (i.e. the information is retrieved from the
service database), the service agent plays the results to the
subscriber over the telephone.

Form Based Voice Web Page Publishing

In another aspect of the invention, voice web system 100 enables
publishers to compose voice web forms and pages statically using
ordinary word processing programs and link them to voice files created
using ordinary audio capture and editing tools available on personal
computers and workstations. Alternatively, voice web agents can
dynamically compose voice web pages and forms based on user requests
and optionally profiles as well as accessed databases and services.
Advantageously, dynamic form-based publication enables information and
service providers to publish voice web pages using the conventional
telephone without the need for any additional computer based voice web
publishing tools. Dynamic form-based publication is achieved by
combining voice web publishing forms, voice web publishing agents and
voice web page publishing templates.

FIG. 9 is a flow diagram of a voice publishing method in accordance
with the present invention. The method presents 901 a voice web form
to a caller calling into a voice web system using a conventional
telephone. Voice web publishing forms are specially designed voice web
forms that when interpreted (i.e. when played back) using the voice
browser prompt the caller (the voice information publishers) to input
voice and touch tone based input using a telephone. The forms guide
the caller step by step to supply the needed information, edit and
modify the information and finally submit 903 the information for
processing 902.

Voice web publishing agents process 902 the filled voice web
publishing forms extracting and separating voice information and touch
tone input. Based on the touch tone inputs, the agents may present
additional publishing forms to the caller (publisher). The voice
information is stored 904 in voice files and linked to the
corresponding voice web page publishing template by substituting
variables within the page template with the generated files. The touch
tone input is used whenever the caller (publisher) needs to input
alphanumeric information that can be processed by the publishing
agent.

Voice Web White, Yellow and Order Pages

[...] applicability of form based [...] publishing, a specific
application of the [...] form-based publishing is next described. The
exemplary form based publishing process relates to the publication of
voice web business white pages, yellow pages [...] white-yellow-order
[...] accordance with the present invention. [...] dynamically
composed by the voice web business white pages agent 1003 from a
business white page database [...] information including the name,
address, phone number of businesses. The white pages agent 1003
presents a search form to a caller for specifying the name of the
business and allows further narrowing of the search by city and state.
Each business white page can be linked to a corresponding business
yellow page 1004. Business yellow pages 1004 contain additional
information about the business including a tag line, advertisement,
directions, working hours, and promotions. In addition, each yellow
page 1004 [...] linked to a corresponding business order entry form
1005. Business order entry forms 1005 allow users to order products
and services or transact business by specifying product or service
codes, preferences, quantity, and credit card numbers for payment.

A participating business can publish a voice web yellow page [...]
simply filing a corresponding voice web [...] page publishing form
1007. A yellow page publishing [...] processes the yellow page
publishing form 1007 and dynamically generates a business yellow page
1004 for that business from a standard yellow page template by
replacing variables in the template with values supplied by the
submitted yellow page publishing form.
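The default-filling step of query customization process 800 (delivery 804) can be sketched as follows. This is an illustrative sketch only: the function name, the field names and the flat dictionary used for the attributes and preferences page are assumptions, not structures defined in the text.

```python
from typing import Any, Dict, List, Tuple


def customize_query(
    form_fields: List[str], preferences: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Split a query form into prefilled fields and fields still to be asked.

    form_fields: field names of the query form, in the order they are played.
    preferences: hypothetical flat view of the subscriber's attributes and
                 preferences page for the requested service.
    """
    prefilled = {name: preferences[name]
                 for name in form_fields if name in preferences}
    # Only these fields need to be played to the subscriber on the telephone.
    remaining = [name for name in form_fields if name not in preferences]
    return prefilled, remaining


stock_form = ["account_number", "symbols", "date"]
stock_prefs = {"account_number": "12345", "symbols": ["ABC", "XYZ"]}
prefilled, remaining = customize_query(stock_form, stock_prefs)
```

In this example only the "date" field would be played to the subscriber; the other two values come from the stored profile.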
The yellow page publishing agent 1006 (a publishing
agent) presents a yellow page voice web publishing form 
1007 to the participating business. Voice web publishing 
forms are specially designed voice web forms that when 
interpreted (i.e. when played back) using the voice browser 
prompt the caller (the voice information publishers) to input 
voice and touch tone based input using a telephone. Yellow 
page publishing form 1007 guides the caller step by step to 
supply the needed information, edit and modify the infor- 
mation and finally submit the information for processing, as 
described in reference to FIG. 9. Specifically, yellow page 
publishing form 1007 prompts for voice information includ- 
ing name, tag line, advertisement, directions, working hours 
and promotions. In addition, the yellow page publishing 
agent 1006 prompts for touch tone input including the 
account number, password, phone number, yellow page 
category code and credit card number. Yellow page publish- 
ing agent 1006 uses the account number to identify the 
business, the password to verify the business, the phone 
number to link it to the corresponding white page, the yellow
page category code to classify the business within business 
yellow pages, and the credit card number to pay for the 
business yellow page. Once the business is identified and 
verified, yellow page publishing agent 1006 dynamically 
creates a business yellow page 1004 from a standard tem- 
plate for the appropriate category. Yellow page publishing 
agent 1006 uses the supplied business phone number to
match with the appropriate database entry in the business 
white pages and updates it with the URL of the newly 
created yellow page to link it. 
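The template step, in which variables in a standard yellow page template are replaced by values from the submitted publishing form, might look like the sketch below. The `$variable` placeholder syntax, the variable names and the file paths are assumptions made for illustration; the text does not specify how template variables are written.

```python
from string import Template
from typing import Dict

# Hypothetical standard template for one yellow page category. Each variable
# is replaced by the URL of a voice file recorded through the publishing form.
YELLOW_PAGE_TEMPLATE = Template(
    '<VOICE TYPE="File" SRC="$name_file">\n'
    '<VOICE TYPE="File" SRC="$tagline_file">\n'
    '<VOICE TYPE="File" SRC="$hours_file">\n'
)


def generate_yellow_page(voice_files: Dict[str, str]) -> str:
    """Fill the template; raises KeyError if a required recording is missing."""
    return YELLOW_PAGE_TEMPLATE.substitute(voice_files)


page = generate_yellow_page({
    "name_file": "acme/name.wav",
    "tagline_file": "acme/tagline.wav",
    "hours_file": "acme/hours.wav",
})
```

Using `substitute` rather than `safe_substitute` makes an incomplete form fail loudly instead of publishing a page with unresolved variables.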

A very similar process occurs for publishing order entry
forms. A business order entry form publishing agent, order
page publishing agent 1008, presents an appropriate order
entry publishing form 1009 to a participating business.
Order page publishing agent 1008 requests appropriate
customized prompts for specific fields in the business order
entry form such as product or service code, customer
preferences, quantity, credit card number etc. Order page
publishing agent 1008 also requests touch tone input for
the account number, password, phone number, and credit 
card number. Order page publishing agent 1008 uses the 
account number and password for identification and 
verification, the phone number to link it to the corresponding 
yellow page 1004 and the credit card number for payment 
for the order entry form. Once the business is identified and 
verified, order page publishing agent 1008 dynamically 
generates an order entry form for that business by filling the 
supplied information into a standard order entry template for 
that business category. Order page publishing agent 1008 
uses the supplied business phone number to match with the
appropriate database entry in the business white pages, 
updates it with the URL of the newly created order entry 
page, locates the corresponding yellow page using its URL 
in the database, and updates it to link to the newly created 
order entry page. 

The foregoing discussion discloses and describes merely 
exemplary embodiments of the present invention. As will be 
understood by those familiar with the art, the invention may 
be embodied in other specific forms without departing from 
the spirit or essential characteristics thereof. Accordingly, 
the disclosure of the present invention is intended to be 
illustrative, but not limiting, of the scope of the invention, 
which is set forth in the following claims. 



APPENDIX A 



I. HVML Specification 

Hyper Voice Markup Language consists of a set of extensions to existing HTML. Some 
of the extensions are new elements with new tags and attributes. Others are extensions to
existing elements in the form of new attributes. All attribute values are shown as %value
type%.

In-line Voice components

The primary mechanism for introducing voice prompts into an HTML page is a new 
inline voice HVML element similar to the inline image HTML element. The tag for this 
element is "VOICE" and it has many variations. Each variation is specified by value of 
the TYPE attribute. Depending on the type, each variation has additional attributes. 
Voice Files

<VOICE TYPE="File" SRC="%URL%" TEXT="%text%">
VOICE tag with TYPE set to "File" indicates a file containing pre-recorded voice
information. Its attributes are SRC and TEXT. SRC attribute specifies the URL for the
voice file and TEXT attribute, which is optional, specifies the text that can be translated
to speech as an alternative to the voice file.
Voice Index Files

<VOICE TYPE="Index" SRC="%URL%" INDEX="%index%" TEXT="%text%">
VOICE tag with TYPE set to "Index" indicates an indexed file containing pre-recorded
voice phrases. Its attributes are SRC, INDEX and TEXT. SRC and TEXT have same
meaning as in Voice Files. The INDEX attribute specifies index of the phrase within the
file either as a number or a label.
For example:

<VOICE TYPE="File" SRC="myweb/home/greeting.wav">
Text-to-speech

<VOICE TYPE="Text" TEXT="%text%">
VOICE tag with TYPE set to "Text" indicates a text-to-speech string. Its attribute is
TEXT which specifies the string that needs to be translated to speech.
For example:

<VOICE TYPE="Text" TEXT="Welcome to your Home Page">
Voice Streams:

<VOICE TYPE="Stream" VALUE="%URL%" TERMINATE="%tone%">
VOICE tag with TYPE set to "Stream" indicates a continuous voice stream identified by
its URL. The browser accesses the voice stream and continuously plays it to the user. Its
attribute is TERMINATE which specifies the tone the user can enter to terminate the
playback.

Currency

<VOICE TYPE="Money" VALUE="%number%" FORMAT="%format%">
VOICE tag with TYPE set to "Money" indicates a number that needs to be presented as
currency. Its attributes are VALUE and FORMAT. VALUE specifies the decimal value
of the number and FORMAT, which is optional, specifies the currency type such as "US
Dollar", "British Pound" etc. The default value for FORMAT is "US Dollar".
Numbers

<VOICE TYPE="Number" VALUE="%number%" FORMAT="%format%">
VOICE tag with TYPE set to "Number" indicates a number that needs to be presented as
a decimal number. Its attributes are VALUE and FORMAT. VALUE specifies the
decimal value and FORMAT, which is optional, specifies the precision to be conveyed.
Digits after the decimal point are pronounced as characters. Default value for the
FORMAT is 2 which indicates 2 digit precision after decimal point.
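One possible reading of the "Number" rule (a fixed number of digits after the decimal point, each pronounced as a separate character) is sketched below in Python. The function name, the use of rounding rather than truncation, and the spoken word "point" are assumptions; the specification only gives the default precision of 2.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List


def number_tokens(value: str, precision: int = 2) -> List[str]:
    """Return the tokens a voice browser might speak for a Number element.

    The whole part is one token; each digit after the decimal point is its
    own token, mirroring "pronounced as characters".
    """
    quantum = Decimal(1).scaleb(-precision)  # e.g. 0.01 for precision 2
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    whole, _, fraction = format(rounded, "f").partition(".")
    tokens = [whole]
    if fraction:
        tokens.append("point")  # assumed spoken separator
        tokens.extend(fraction)
    return tokens
```

With the default precision, a VALUE of "12.5" yields the whole part followed by the two digits "5" and "0", each spoken individually.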

Characters

<VOICE TYPE="Character" VALUE="%string%">
VOICE tag with TYPE set to "Character" indicates a sequence of characters that are to be
presented separately with no pauses in between. Its attribute is VALUE which specifies
the sequence of characters as string.
Dates

<VOICE TYPE="Date" VALUE="%date%" FORMAT="%format%">
VOICE tag with TYPE set to "Date" indicates an expression that is to be presented as a
date. Its attributes are VALUE and FORMAT. VALUE attribute specifies the expression
and the FORMAT attribute, which is optional, specifies the format of the expression.
Default format is MM/DD/YY.

Ordinals

<VOICE TYPE="Ordinal" VALUE="%number%">
VOICE tag with TYPE set to "Ordinal" indicates a number that is to be presented as an
ordinal (i.e. as Nth value). Its attribute is VALUE which specifies the number. Values
are pronounced as "first", "second", "third" etc.
Strings:

<VOICESTRING NAME="%name%">
. . . Voice Components . . .
</VOICESTRING>

VOICESTRING tag indicates a sequence of voice components that are grouped together 
for presentation without any pauses in between. Each of the voice components can be 
any of the primitives previously defined. The voice browser gathers the individual 
components and plays them together in sequence.
<Voicestring NAME="welcome">
<Voice TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">
<Voice TYPE="File" SRC="username.vox" TEXT="user's name">
<Voice TYPE="Index" SRC="welcome.vap" INDEX="end" TEXT="to VOIS NET">
</VoiceString>

The voice browser "plays" each in-line voice component in sequence as it encounters it in 
the HVML page starting from the beginning of the page. Each voice component is played 
only once for each presentation. A "reload" command would cause the voice browser to 
re-play the page. 

Of course, voice elements can also be invoked by hyper links pointing to voice files
containing digitized voice data. This is similar to existing HTML conventions. The voice
browser simply fetches the new page and plays it once. In the next section, we will
discuss how hyperlinks can be invoked using touch tone or key word input.
Voice responsive labels for hyper-links

In order to invoke hyper links embedded in a HVML page, two new attributes "TONE"
and "LABEL" are added to the anchor element. These attributes are used in conjunction
with the existing HREF attribute in an anchor element that makes the anchor into a hyper
link. When the user selects the touch tone signals specified by the value of the TONE
attribute followed by the tone or utters the word specified by the LABEL attribute,
the browser invokes the corresponding hyper link. The TONE and LABEL attribute
values must be unique within a page.
For example:

<A HREF="myweb/home/greeting.vml" TONE="HELLO">
or

<A HREF="myweb/home/greeting.vml" LABEL="HELLO">
When the user presses "H,E,L,L,O,#" on the touch tone phone or the user says the
word "HELLO" on the phone, the browser will invoke the corresponding hyper link and
access the "greeting.vml" page.

Keyword accessible indexes for anchors 

HTML allows the index access of fragments within a page by unique labels associated 
with anchors surrounding the fragment. The NAME attribute in an anchor element
specifies a label that is unique within the page. This label can then be used as an index by 






the browser to search for the fragment by matching the unique label with the one supplied
in the hyperlink. The hyperlink for the indexed fragment uses the regular URL for the
page concatenated with the fragment's unique label with a separator.
Coupled with voice responsive hyper links, fragment labels can be used to construct
single menus or database searches.

For example: 

Suppose "myweb/home/prompts.vml" contains the following HVML text.

<A NAME="prompt1">
<VOICE TEXT="Press CAL# for Calendar">
</A>
<A NAME="prompt2">
<VOICE TEXT="Press ADDR# for Address Book">
</A>
<A NAME="prompt3">
<VOICE TEXT="Press EMAIL for Electronic Mail">
</A>

Suppose another HVML page contains the following hyperlinks.

<A HREF="myweb/home/prompts.vml#prompt1" TONE="1">Press 1 to hear Prompt1</A>
<A HREF="myweb/home/prompts.vml#prompt2" TONE="2">Press 2 to hear Prompt2</A>
<A HREF="myweb/home/prompts.vml#prompt3" TONE="3">Press 3 to hear Prompt3</A>

Then, if the user presses the browser will fetch the "myweb/home/prompts.vml" 
HVML page, match "prompt1" index with the first anchor's "prompt1" label, and start
presenting the prompts starting with text-to-speech translation of "Press CAL# for
Calendar".
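The fragment lookup in this example can be sketched in Python using the standard library's `urldefrag` to split the hyperlink into page URL and label. The in-memory `PAGES` table and the `resolve` function are hypothetical; a real browser would fetch and parse the HVML page instead.

```python
from typing import Dict, List
from urllib.parse import urldefrag

# Hypothetical parsed form of "myweb/home/prompts.vml":
# anchor NAME -> prompt text, kept in document order.
PAGES: Dict[str, Dict[str, str]] = {
    "myweb/home/prompts.vml": {
        "prompt1": "Press CAL# for Calendar",
        "prompt2": "Press ADDR# for Address Book",
        "prompt3": "Press EMAIL for Electronic Mail",
    }
}


def resolve(href: str) -> List[str]:
    """Return the prompts to present, starting at the anchor named in href."""
    url, label = urldefrag(href)  # splits at the "#" separator
    anchors = PAGES[url]
    names = list(anchors)
    start = names.index(label) if label else 0
    return [anchors[name] for name in names[start:]]
```

Presentation starts at the matched anchor and continues through the rest of the page, which matches the wording "start presenting the prompts starting with" the matched prompt.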
Browser Control

<PAUSE TIMEOUT="%seconds%" TERMINATE="%tone%">
In order to let the voice page publisher control the behavior of the voice browser,
HVML defines a tag "Pause" with "TIMEOUT" and "TERMINATE" attributes. When
the browser encounters a PAUSE statement, it pauses until either the amount of time
specified in the TIMEOUT attribute elapses or the user enters the tone specified in the
"TERMINATE" attribute. If the value of the TIMEOUT attribute is 0, then the browser
waits there indefinitely. The default value for TIMEOUT is 1 second. Default value for
TERMINATE is "r. 
Voice Responsive Forms

HVML uses the FORM tag to enable user input similar to HTML including the 
METHOD attribute which specifies the way parameters are passed to the server and the 
ACTION attribute which specifies the procedure to be invoked by the server to process 
the form. HVML extends the INPUT tag within forms by introducing VOICEINPUT tag. 
VOICEINPUT takes a TYPE attribute similar to the INPUT tag with three new values 
"voice", "tone" and "review" in addition to the existing "reset" and "submit" values. 
The HVML browser pauses at each VOICEINPUT statement in a HVML form until the 
specified input is supplied or input is terminated before processing the remaining form. 
The VOICEINPUT tag with TYPE value set to "voice" indicates a form that accepts 
voice input. Usually, a voice prompt or text-to-speech segment precedes the 
VOICEINPUT tag alerting the user that input is required and how to terminate input. The 
user is expected to speak and this message is recorded in real-time and supplied to the 
Voice Web server for processing. The VOICEINPUT tag containing "voice" value for the
TYPE attribute also supports a MAXTIME attribute which specifies the maximum
recording time for the message and a TERMINATE attribute which specifies the touch
tone that terminates input. If the MAXTIME attribute is not specified, then the default
value of "15" is assumed. If TERMINATE attribute is not specified, then the default
value of "#" is assumed. For example, if the MAXTIME value is 20 and TERMINATE
value is "#", then recording terminates when the user presses "#" or 20 seconds of time
elapses. 

The VOICEINPUT tag with TYPE value set to "tone" indicates a form that accepts touch 
tone input. Again, a voice prompt or a text-to-speech segment precedes the
VOICEINPUT tag alerting the user for input. The user is expected to press a sequence of
touch tones which are recorded and supplied to the Voice Web server for processing. The
VOICEINPUT tag containing "tone" value for the TYPE attribute also supports a
MAXDIGITS attribute which specifies the maximum number of touch tone digits that
can be supplied and a TERMINATE attribute which specifies the touch tone that
terminates input. If the MAXDIGITS attribute is not specified, then the default value of
"20" is assumed. If TERMINATE attribute is not specified, then the default value of
is assumed. For example, if the MAXDIGITS value is 10 and TERMINATE value is "#",
then input process terminates when the user presses "#" or 10 digits are supplied.
The VOICEINPUT tag with TYPE value set to "review" indicates that the current values 
of the form can be reviewed by selecting the "review" input. The VOICEINPUT tag with
TYPE value set to "reset" indicates that the current values of the form should be reset to
their original defaults. The VOICEINPUT tag with TYPE value set to "submit" indicates
that the current form should be submitted to the server. Each of these three TYPE values
support a SELECTTONES attribute and a SKIPTONES attribute. SELECTTONES 
attribute specifies the sequence of touch tones that activates the corresponding selection. 
SKIPTONES attribute specifies the sequence of touch tones that skips the selection. If the 
SELECTTONES attribute is not specified, then the default value of is assumed and
if the SKIPTONES attribute is not specified, then the default value of is assumed.
For example, if the SELECTTONES attribute value is "REVIEW" and SKIPTONES 
attribute value is "SKIP" for a VOICEINPUT element with TYPE value set to "review", 
the user can enter "REVIEW" to review the form values or enter "SKIP" to skip the
selection. VOICEINPUT tag with TYPE value set to "submit" similarly indicates the
values of the form can be submitted to the server. If the SELECTTONES attribute value
is "DONE" and the SKIPTONES attribute value is the user can either enter 
"DONE" to submit the form or press "*•" to skip the selection. VOICEINPUT tag with 
TYPE value set to "reset" similarly indicates that the values of the form be reset to their 
original values. 
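The termination rule for a VOICEINPUT element of TYPE "tone" (stop on the TERMINATE tone or once MAXDIGITS digits have been supplied) can be sketched as below. The function name is hypothetical, the MAXDIGITS default of 20 comes from the text, and the "#" default for TERMINATE is an assumption carried over from the "voice" input type.

```python
from typing import Iterable


def collect_tones(keys: Iterable[str], max_digits: int = 20,
                  terminate: str = "#") -> str:
    """Collect touch tones until the terminator or the digit limit is reached.

    keys: the sequence of keys pressed by the caller, one character each.
    The terminator itself is not included in the returned input.
    """
    collected = []
    for key in keys:
        if key == terminate:
            break
        collected.append(key)
        if len(collected) >= max_digits:
            break
    return "".join(collected)
```

For instance, with MAXDIGITS set to 10 a caller who keeps pressing digits without the terminator is cut off after the tenth digit.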

II. Voice Browser Commands

All browser commands must start with the "*" key. Each browser command is associated
with one or more key words that uniquely identify it. For example, in order to activate
"Home" command, the user would press "*home" on the telephone key pad. The key
words are chosen in such a way to generate unique dial tone sequences. A set of default
browser commands are listed below with the keyword and description of the command.
Alternatively, the browser commands can also be issued by vocalizing the corresponding
commands. For example, to activate the "Home" command, the user would say "home"
on the telephone.
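The digit sequences quoted for each command follow from spelling the key word on a telephone key pad. A minimal Python sketch of that mapping is shown here; it assumes the common letter layout in which "q" and "z" sit on the 7 and 9 keys, and the function name is hypothetical.

```python
# Letters printed on each key of a standard telephone key pad (assumed layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def command_tones(keyword: str) -> str:
    """Translate a browser command key word into its touch tone sequence."""
    return "*" + "".join(LETTER_TO_DIGIT[ch] for ch in keyword.lower())
```

A check of this kind is also a convenient way to verify that the short and long forms of different commands do not collide on the same digit sequence.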
Previous

Jump to the previous page from which the current page was accessed via a hyper
link. This command is activated by pressing "*pr" (*77) or "*prev" (*7738)
sequence.
Next

Jump to the next page in a sequence of hyper links. This command is activated by
pressing "*n" (*6) or "*next" (*6398) sequence.
History

Present the titles of the pages accessed so far in the order of their hyper link
access sequence. Pause after each title. If the user presses "#", then jump to the
page specified by the title. If not, proceed to the next title. This command is
activated by pressing "*hi" (*44) or "*hist" (*4478) sequence.
Home

Jump to the first page in the sequence of hyper links. This command is activated
by pressing "*ho" (*46) or "*home" (*4663) sequence.
Reload

Reload the current page again from the Web server. This command is activated by
pressing "*re" (*73) or "*relo" (*7356) sequence.

Help

Jump to the home page of the help page set. Help pages are navigated in exactly
the same way as ordinary HVML pages. However, a new browser instance is
created on activation which must be "exited" to get back to the page context from
which "Help" page set was accessed. This command is activated by pressing "*h"
(*4) or "*help" (*4357) sequence.
Fax

Jump to the home page of the Fax dialog session using HTML forms. Again, a
new browser instance is created on activation which must be "exited" to get back
to the page context from which "Fax" dialog session was activated. This
command is activated by pressing "*fa" (*32) or "*fax" (*329) sequence.
Stop

Stop loading the page that is currently being accessed. This command is activated
by pressing "*t" (*8) or "*stop" (*7867) sequence.
Exit

Exit the current instance of the browser and return to the page being accessed in
the previous instance of the browser. If this is the first instance of the browser,
then exit the browser and hang-up the phone. This command is activated by
pressing "*x" (*9) or "*exit" (*3948) sequence.
Bookmarks

Present the titles of the pages selected as bookmarks in the order of their hyper
link access sequence. Pause after each title. If the user presses "#", then jump to
the page specified by the title. If not, proceed to the next title. This command is
activated by pressing "*bo" (*26) or "*book" (*2665) sequence.

III. Voice Browser Playback Controls

When the Voice browser is activated to play back voice prompts or speech segments, an
additional set of browser commands are available to the user to control the playback.
Pause 

Pause the play back at current position. This command is activated by pressing 
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"■p" (•?) or —pause" (•72873). 
Play 

Continue play back from current position. This command is activated by pressing 

"•p" i*r) or -play" C*7529). 

Backup 

Back up the play back position by S seconds and start play back. The command is 
activated by pressing "*b'* (•2) or "•back" (•2225). Repeated pressing of the 
same tone implies successive back up by 5 seconds for each tone. 
Forward 

Forward the play back position by 5 seconds and start play back. The command is 
activated by pressing "*f' ('3) or "•frwd" ('3793). Repeated pressing of the same 
tone implies successive skip forward by 5 seconds for each tone. 
Start 

Back up the play back position to the beginning of the play back sequence and
start play back. The command is activated by pressing "*0".

End 

Jump to the end of the play back sequence, backup by 5 seconds and start play 
back. The command is activated by pressing 
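
The playback controls described in this section amount to simple arithmetic on a play back position. The sketch below illustrates that arithmetic in Python under stated assumptions: the class name `Playback`, its fields, and the clamping of the position to the range from zero to the sequence length are all hypothetical, since the text does not say what happens when a skip would run past either end.

```python
from dataclasses import dataclass

SKIP_SECONDS = 5.0  # step size given in the text for Backup, Forward and End


@dataclass
class Playback:
    duration: float        # total length of the play back sequence, in seconds
    position: float = 0.0  # current play back position, in seconds
    playing: bool = True

    def _seek(self, target: float) -> None:
        # Assumption: clamp to [0, duration]; then resume play back.
        self.position = max(0.0, min(self.duration, target))
        self.playing = True

    def pause(self) -> None:
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def backup(self) -> None:
        self._seek(self.position - SKIP_SECONDS)

    def forward(self) -> None:
        self._seek(self.position + SKIP_SECONDS)

    def start(self) -> None:
        self._seek(0.0)

    def end(self) -> None:
        # Jump to the end, then back up by the skip interval.
        self._seek(self.duration - SKIP_SECONDS)
```

Repeated calls to `backup()` or `forward()` model repeated pressing of the same tone, each call moving the position by a further 5 seconds.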



What is claimed is: 

1. A method of delivering caller-customized voice-based
information to a caller, comprising:

storing caller-specific information in a computer file at a
universal resource locator (URL);

determining a URL associated with the caller;

retrieving the caller-specific information using the URL;

processing at least one caller command received over
the telephone to determine a service request;

retrieving information responsive to the service request
and responsive to the caller-specific information,
including:

  generating a database query form responsive to the
  service request;

  customizing the database query form using the
  caller-specific information; and

  performing a database search using the query form,
  wherein generating a database query form responsive
  to the service request includes:

    storing a voice form associated with the service
    request at a universal resource locator (URL)
    address in the computer network wherein the
    voice form is stored in a markup language;

    playing the voice form to the caller to generate at
    least one information prompt for the caller;

    collecting information from the caller in response
    to each prompt; and

    generating a database query form using at least a
    portion of the collected information; and

playing back the retrieved information to the
caller over the telephone.

2. The method of claim 1 wherein collecting information 
from the caller in response to each prompt includes collect- 
ing touch tone inputs from the caller. 

3. The method of claim 1 wherein collecting information 
from the caller in response to each prompt includes collect- 
ing voice command inputs from the caller and performing 
speech recognition on the voice command inputs.

4. A method of processing voice-based information
received from a telephone caller over a computer network,
comprising:

storing a voice form at a universal resource locator (URL)
address in the computer network wherein the voice
form is stored in a markup language with voice
extensions; and

during a calling session:

  playing the voice form to the caller to generate at least
  one information prompt to the caller;

  collecting information from the caller in response to
  each prompt; and

  storing the collected information in a first markup
  language document and including in the document a
  hyperlink to a second markup language document.

5. The method of claim 4 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 
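
By way of illustration only, the storing step of claims 4 and 5 could be sketched as follows. This is a hypothetical Python example and not the claimed implementation: the function name `build_response_document`, the use of plain HTML without voice extensions, the `category` field, and the `example.com` address are all assumptions introduced here.

```python
from html import escape
from urllib.parse import quote


def build_response_document(answers: dict[str, str], base_url: str) -> str:
    """Render collected answers as a first document that links to a second one."""
    # Assumption: the hyperlink target is derived from the 'category' answer,
    # so the link is determined by a portion of the collected information.
    target = f"{base_url}/{quote(answers.get('category', 'index'))}.html"
    items = "\n".join(
        f"  <li>{escape(key)}: {escape(value)}</li>"
        for key, value in answers.items()
    )
    return (
        "<html><body>\n<ul>\n" + items + "\n</ul>\n"
        f'<a href="{escape(target, quote=True)}">Related information</a>\n'
        "</body></html>"
    )


if __name__ == "__main__":
    collected = {"category": "pizza", "city": "San Jose"}
    print(build_response_document(collected, "http://example.com/voice"))
```

The answers gathered during the calling session become list items in the first document, and the anchor element carries the hyperlink to the second document.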

6. A method of processing voice-based information
received from a telephone caller over a computer network,
comprising:

storing a voice form at a universal resource locator (URL)
address in the computer network wherein the voice
form is stored in a markup language with voice
extensions; and

during a calling session:

  playing the voice form to the caller to generate at least
  one information prompt for the caller;

  collecting information from the caller in response to
  each prompt; and

  storing the collected information in a first markup
  language document and including in a second
  markup language document a hyperlink to the first
  markup language document.

7. The method of claim 6 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 

8. A system for delivering information over a telephone,
comprising:

a business white pages database including business name,
address and phone number information;

a database query form;

a first processing agent programmed to:

  collect user information using a voice based
  telecommunications device;

  include at least some of the collected information to the
  database query form;

  search the database by applying the database query
  form to the database to retrieve information; and

  generate a voice web page having a universal resource
  locator (URL) address using the retrieved information;

a yellow page database including business advertising
information; and

a second processing agent wherein the voice web page
generated by the first processing agent includes a
hyperlink to the second processing agent and wherein
the second processing agent is programmed to:

  search the yellow page database to retrieve
  information; and

  generate a voice web page using the retrieved
  information; and

a voice web browser adapted to play voice web pages
to a user.

9. The system of claim 8 wherein the hyperlink identifies
an entry in the yellow page database and wherein searching
the yellow page database comprises locating the yellow page
database entry identified by the hyperlink.

10. The system of claim 8 further comprising:

an order page database including business order
information; and

a third processing agent wherein the voice web page
generated by the second processing agent includes a
second hyperlink to the third processing agent and
wherein the third processing agent is programmed to:

  search the order page database to retrieve information;
  and

  generate a voice web page using the retrieved
  information.

11. The system of claim 10 wherein the second hyperlink
identifies an entry in the order page database and wherein
searching the order page database comprises locating the
order page database entry identified by the hyperlink.